Methods for constructing association maps of imaging data and biological data

ABSTRACT

A method for constructing an association map between imaging features and biological data is described. The method comprises combining one or more image features relating to a clinical subject with biological data and using an algorithm to make predictions based on the features and data.

STATEMENT REGARDING GOVERNMENT INTEREST

This work was supported in part by grant number 1 K08 AR050007 from the National Institutes of Health. The U.S. Government has certain rights in the invention.

TECHNICAL FIELD

The subject matter described herein relates to methods for predicting disease risk, prognosis, and best treatment regimens in clinical subjects. The methods involve evaluating a subject's non-invasively obtained imaging features in view of an association map that correlates imaging features with biological data.

BACKGROUND

Scientists and clinicians routinely use non-invasive imaging to detail the physical and structural composition of living matter. Assessing the genetic and biochemical makeup of living tissue through non-invasive imaging is a desirable goal of current research. Recent developments in genomic and proteomic methods have enabled molecular profiling of biological specimens by simultaneously revealing the expression levels of thousands of genes and proteins. For example, gene expression patterns of cancer can reveal its etiology, prognosis, and therapeutic potential (Chung, C. H. et al., Nat. Genet., 32 Suppl.:533-540 (2002); Segal, E. et al., Nat. Genet., 37 Suppl.:S38-45 (2005); Chen, X. et al., Mol Biol Cell, 13:1929-1939 (2002)).

Current methods of molecular profiling often require invasive surgery for tissue procurement and specialized equipment, thus limiting their routine use. In some cases, current profiling methods provide only a single snapshot in time because they are destructive by nature: cells must be disintegrated to extract nucleic acids or proteins for analysis. Another barrier to widespread use of molecular profiling is that human tissues exhibit diverse distinctive features on noninvasive radiographic imaging, many of which currently have no known significance. Because imaging features of tissues reflect the dynamic and physiologic interplay of parenchymal cells, blood vessels, and stroma, it would be desirable if imaging features could be used to predict specific gene expression patterns in human diseases.

The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the drawings.

BRIEF SUMMARY

The aspects and embodiments described and illustrated below are meant to be exemplary and illustrative, not limiting in scope.

In one aspect, a method of constructing an association map between imaging features and biological data is provided, comprising:

identifying one or more imaging features from a plurality of images of a subject;

applying an algorithm to identify relationships between the one or more imaging features and biological data relating to the subject, wherein the identified relationships are used to construct an association map between the one or more imaging features and the biological data;

evaluating the statistical significance of the association map to test its predictive value.

In some embodiments, the features from a plurality of images of a subject are associated with a disease.

In some embodiments, the identifying comprises identifying one or more imaging features based on frequency of the one or more features in the plurality of images.

In some embodiments, the identifying comprises identifying one or more imaging features based on their independence from other features.

In some embodiments, the identifying comprises identifying one or more imaging features from images obtained using an imaging technique selected from the group consisting of computerized tomography imaging, magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasonography (US), optical imaging, infrared imaging, and x-ray radiography. In particular embodiments, the imaging technique comprises the use of an imaging agent or image-enhancing agent.

In some embodiments, the applying comprises applying a module networks algorithm.

In some embodiments, the applying comprises applying an algorithm that applies an iterative Bayesian probabilistic procedure that identifies combinations of imaging features that relate to the biological data.

In some embodiments, the applying comprises applying an algorithm to gene expression data.

In some embodiments, the gene expression data is from a DNA microarray assay. In some embodiments, the gene expression data is from a cDNA microarray assay. In some embodiments, the gene expression data is from an RNA microarray assay.

In some embodiments, the applying comprises applying an algorithm to protein expression data.

In some embodiments, the evaluating the statistical significance of the association map comprises evaluating by comparison of the map with permuted data sets.

In some embodiments, the evaluating the statistical significance of the association map comprises evaluating by testing the prediction using an independent biological data set, independent images, or both.

In a related aspect, a method for predicting a gene or protein expression level in a biological sample is provided, comprising:

providing an image of the biological sample,

comparing the image to an association map constructed as above to predict the gene or protein expression level of the biological sample.

In some embodiments, the method further comprises providing, based on the predicting and on the presence and/or absence of certain imaging features, a treatment prognosis for the patient from whom the biological sample was obtained.

In some embodiments, the providing comprises providing a prediction of a patient's response to a drug. In some embodiments, the providing comprises providing a prediction of a patient's probable survival. In particular embodiments, the probable survival is disease free survival.

In some embodiments, the providing comprises providing a likelihood of disease recurrence.

In some embodiments, the providing comprises providing a likelihood of metastasis.

In another aspect, an association map constructed using the above method is provided.

In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the drawings and by study of the following descriptions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1C are computerized tomography (CT) images of distinct features in human hepatocellular carcinomas (HCC), the features referred to as internal arteries (FIG. 1A), hypodense halo (FIG. 1B), and texture heterogeneity (FIG. 1C);

FIG. 1D illustrates a strategy for constructing an association map between imaging features and gene expression;

FIG. 2A shows an overview of an association map of imaging features and global gene expression, where each column is a sample; each row is a module. For each module, a decision tree of imaging features is associated with variation in the expression level of module genes. Knowledge of the imaging features thus allows an approximate reconstruction of the gene expression pattern.

FIG. 2B is a graph showing the cumulative fraction of gene expression variation across the full complement of gene activities that is predicted by the number of imaging features in the model.

FIG. 2C shows a matrix of modules, associated imaging features, and their enriched gene ontology annotations. Only modules and annotations with significant enrichment (false discovery rate <0.05 after accounting for multiple hypothesis testing) are shown.

FIGS. 3A-3C show molecular portraits of HCC from imaging features, where modules associated with HCC proliferation (FIG. 3A), liver synthetic function (FIG. 3B), and extracellular matrix remodeling (FIG. 3C) are shown; each column is a tumor sample; each row is a gene. Imaging features specifying each module are outlined on top; expression pattern of genes within the module as distinguished by imaging features are shown on bottom.

FIGS. 4A-4B show that imaging features predict venous invasion and survival, where a two-feature decision tree associated with a gene expression signature of venous invasion is shown to predict histologic venous invasion (FIG. 4A), and Kaplan-Meier survival curves of HCC patients with and without “internal arteries” imaging feature are shown in FIG. 4B.

FIG. 5 is a Table showing examples of image features.

DETAILED DESCRIPTION

In one aspect, a method is provided wherein an image or one or more imaging features is correlated to an association map of imaging features and biological data. The method finds use in various fields, including medical diagnostics and therapeutics. The methods have use in clinical subject/patient disease screening, diagnosis, characterization, and treatment selection.

The method is based on correlating biological data with associated imaging data to construct a bidirectional association map, as will be illustrated below in Example 1. The biological data for construction of the association map can be obtained from a database or generated from patient biological samples. Databases of polynucleotide and protein expression data are well known. Such gene expression data can also be obtained, for example, using a DNA microarray that surveys the expression levels of thousands of genes simultaneously. Commercial multigene assays illustrate the clinical value of such data: the 21-gene Oncotype DX assay is used to determine prognosis and predict the response of primary breast tumors to chemotherapy, and the 70-gene MammaPrint signature is used in determining an adjuvant chemotherapeutic regimen in primary breast cancer. Gene expression signatures have also been identified to predict prognosis or therapeutic response in lung cancer, leukemia, and prostate cancer.

Data from any or all of these sources, preexisting or generated for the purpose of building an association map, are examples of biological data suitable for use in the method described herein. It will be appreciated that the gene expression data can be from any tissue source, such as cancerous tissue, tissue associated with a malignant or benign growth, infected tissue, inflamed tissue, and the like. Gene expression data may relate to expression levels, splicing patterns, gene copy number, chromosomal alterations (e.g., deletions, amplifications, inversions, and translocations), single nucleotide polymorphisms, and the like. Gene expression data include epigenetic data, e.g., relating to DNA methylation and histone modifications (e.g., acetylation, methylation, and ubiquitination). Gene expression data may be based on analyses of DNA, cDNA, mRNA, snRNA, iRNA, or other nucleic acids.

Biological data also include data from protein-based analyses, including protein expression profiles of different tissues (e.g., cancerous, infected, or inflamed tissue). Biological data from Serial Analysis of Gene Expression (SAGE), nuclear magnetic resonance, protein-interaction screens, chromatin immunoprecipitation-chips, isotope-coded affinity tagging, activity-based reagents, gel or chromatographic separation, RNAi screens, tissue arrays, or mass spectrometry, in which a large number of genes, proteins, or metabolites are measured in a single experiment or assay, are also contemplated. Biological data further include data from serological tests, electrocardiograms (EKGs), electroencephalograms (EEGs), urinalysis, and other clinical and forensic analyses.

As noted above, the method combines the association map with imaging data. Such imaging data can be obtained from a wide variety of sources, including but not limited to magnetic resonance imaging (MRI), positron emission tomography (PET), computerized tomography (CT), ultrasonography (US), optical imaging, infrared imaging, and x-ray radiography. Imaging can be coupled with drugs or compounds, contrast agents or other agents or stimuli, or medical devices to elicit additional information from the imaging. Images are obtained by applying these modalities to a tissue sample, a lesion, or an organism imaged in whole or in part.

In a general embodiment, the method of constructing an association map comprises providing a plurality of images of, for example, a tissue or a whole or part of an organism, such as a human subject, and biological data that has some relation to the images. For example, images of a solid tumor would preferably be accompanied by biological data based on the imaged solid tumor or on a like solid tumor. That is, images of tumors in the thyroid or images of infected tissue on a limb would have corresponding biological data from thyroid tumors or infected limb tissue, respectively. In a preferred embodiment, the image and the biological data derive from the same tissue or organism; however, a population of images and a population of biological data need not have a one-to-one correspondence.

An exemplary association map relating to human hepatocellular carcinoma is constructed by inspecting the imaging data and identifying distinctive features in the images. Examples of distinctive image features (or traits) for human hepatocellular carcinomas are shown in FIGS. 1A-1C, where computerized tomography (CT) images of features referred to as internal arteries (FIG. 1A), hypodense halo (FIG. 1B), and texture heterogeneity (FIG. 1C) were identified. As will be illustrated below (Example 1), the image or images may be scrutinized to extract features that inform gene expression. Such features include observations related to morphology, composition, structure, and/or physiology. Examples of distinct features that inform gene expression analyses include tissue necrosis, tissue heterogeneity, tumor margin score, internal septa, enhancement pattern, internal arteries, hypodense halo, wash-out, wash-in, texture heterogeneity, capsule, infiltration, and other imaging features familiar to those of skill in the art.

Such imaging features (and representative data) are associated with a unique image, imaging study, examination, subject or population, all of which are data relating to the image. Such image data independently or in combination define elements or components of the image, or the composite imaging appearance itself, which are included in the biological data used to construct an association map.

It will be appreciated that a single imaging feature may be sufficient to add value to an association map; however, more (and more detailed) features/data are generally preferred.

In some embodiments, the method of constructing an association map further includes one or both of (i) using an algorithm to identify relationships between one or more imaging features and the biological data and (ii) evaluating the statistical significance of the association map.

With respect to (i), algorithms that identify relationships between the imaging features and the biological data are known in the art, and such identified relationships form the basis for constructing an association map between such imaging features and biological data. For example, a module network algorithm is suitable for use (Segal, E. et al., Nat. Genet., 34:166-176 (2003)), wherein the algorithm identifies groups of genes, termed modules, which demonstrate coherent variation in expression across multiple samples. This algorithm further applies an iterative Bayesian probabilistic analysis to identify combinations of imaging features that can predict the expression levels of gene modules.
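The module network algorithm of Segal et al. alternates between assigning genes to modules and learning, for each module, a decision tree over the candidate inputs that is scored with a Bayesian criterion. The Python sketch below illustrates only the second step, in simplified form: for a single module whose per-sample mean expression is already known, it greedily grows a small regression tree over imaging-feature scores. As an assumption made for brevity, it scores splits by reduction in squared error rather than by the Bayesian score of the original algorithm; all function, feature, and sample names are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for fewer than two values)."""
    if len(values) < 2:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(samples, features, expr, min_leaf):
    """Return (feature, threshold, gain) for the imaging-feature split that most reduces SSE."""
    base = sse([expr[s] for s in samples])
    best = None
    for name, scores in features.items():
        for t in sorted({scores[s] for s in samples}):
            left = [s for s in samples if scores[s] <= t]
            right = [s for s in samples if scores[s] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = base - sse([expr[s] for s in left]) - sse([expr[s] for s in right])
            if best is None or gain > best[2]:
                best = (name, t, gain)
    return best

def grow_tree(samples, features, expr, max_depth=3, min_leaf=2):
    """Greedily grow a small regression tree; each leaf stores the mean module expression."""
    split = best_split(samples, features, expr, min_leaf) if max_depth > 0 else None
    if split is None or split[2] <= 1e-12:
        return {"leaf": mean(expr[s] for s in samples), "samples": samples}
    name, t, _ = split
    left = [s for s in samples if features[name][s] <= t]
    right = [s for s in samples if features[name][s] > t]
    return {"feature": name, "threshold": t,
            "left": grow_tree(left, features, expr, max_depth - 1, min_leaf),
            "right": grow_tree(right, features, expr, max_depth - 1, min_leaf)}

# Toy data: six tumors, two hypothetical imaging features, one module's mean expression.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
features = {
    "internal_arteries": {"s1": 1, "s2": 1, "s3": 1, "s4": 0, "s5": 0, "s6": 0},
    "hypodense_halo":    {"s1": 0, "s2": 1, "s3": 0, "s4": 1, "s5": 0, "s6": 1},
}
module_expr = {"s1": 2.0, "s2": 2.1, "s3": 1.9, "s4": -1.0, "s5": -1.1, "s6": -0.9}
tree = grow_tree(samples, features, module_expr, max_depth=1)
print(tree["feature"], tree["threshold"])  # internal_arteries 0
```

Keeping trees shallow mirrors the observation in this document that only a few imaging features are needed per gene; in a real module network the tree depth is controlled by the Bayesian score rather than a fixed `max_depth`.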

As used herein, Bayesian probabilistic analysis refers broadly to a genus of related models and their derivatives. Multiple regression analysis and other analyses are known in the art. Classification algorithms such as neural networks, support vector machines, decision trees, Markov networks, and their derivatives may be applied. An exemplary analysis involves application of the Cox proportional hazards model. Other algorithms that can identify multi-way relationships may also be used.

With respect to (ii), evaluating the statistical significance of the association map helps ensure that the map is applicable to, and predictive for, images and/or biological data that were not used in the construction of the map. Such statistical analysis thereby provides a means to validate the association map as being generally applicable (i.e., generalizable) to other images and biological data.

For example, when two large biological data sets are compared, many apparent associations will occur by chance alone. These spurious associations are not useful, and in fact interfere with the identification of significant (i.e., “real” or “actual”) associations that have predictive value. Thus, a feature of some embodiments of the present method is confirmation of the statistical significance and predictive value of the association map.

Statistical significance can be evaluated in several ways, for example, by comparing the actual/observed association map with theoretical maps derived from modified/permuted data sets, e.g., where the imaging features and biological data have been scrambled. Observation of the same image feature-biological data association at equal frequency in such scrambled data strongly suggests that the image feature or gene module is noisy and non-specific.
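A minimal permutation test in the spirit of the scrambled-data comparison above can be sketched in Python. The association statistic used here (the largest absolute Pearson correlation between any imaging feature and a module's expression) is a hypothetical stand-in; the example in this document compares module network log-likelihoods instead. Shuffling the expression values breaks any link between imaging features and samples while keeping both marginal distributions intact.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def association_score(features, expr):
    """Hypothetical summary statistic: strongest |r| between any imaging feature and expression."""
    return max(abs(pearson(f, expr)) for f in features)

def permutation_pvalue(features, expr, n_perm=1000, seed=0):
    """Compare the observed score with scores from data whose sample labels are scrambled."""
    rng = random.Random(seed)
    observed = association_score(features, expr)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between imaging features and expression
        if association_score(features, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 12 tumors, one informative and one uninformative binary imaging feature.
features = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
]
module_expr = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0, -1.0, -0.8, -1.2, -0.9, -1.1, -1.0]
observed, p = permutation_pvalue(features, module_expr)
print(observed, p)
```

A scrambled data set that reproduces the observed association about as often as the real data would yield a large p-value, which corresponds to the "noisy and non-specific" case described above.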

In addition, statistical significance and predictive value can be evaluated by cross-validation, of which leave-one-out analysis is one form. In cross-validation, an association map is constructed on some fraction of the subject biological data or image features, and the resulting map is used to predict the outcome in the remaining subjects not used to “train” the algorithm. In practice, half, ten percent, or a single individual can be left out as the test set, and the procedure is iterated until each individual subject in the data set has been used both as a test case and for training. Such iterative learning procedures may be a component of the module network algorithm, described above.
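The cross-validation loop described above can be sketched in Python as follows. The "association map" here is deliberately trivial (the mean expression for each value of one binary imaging feature) and stands in for a full module network. Setting `k` equal to the number of subjects gives leave-one-out analysis, while `k = 2` or `k = 10` leaves out roughly half or ten percent of subjects per fold. All function names are illustrative.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split subject indices 0..n-1 into k contiguous folds; k == n gives leave-one-out."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_group_means(xs, ys):
    """Toy 'association map': mean expression for each value of one imaging feature."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return {x: mean(v) for x, v in groups.items()}, mean(ys)

def predict(model, x):
    table, overall_mean = model
    return table.get(x, overall_mean)  # fall back if a feature value was unseen in training

def cross_validation_errors(xs, ys, k):
    """Prediction error for every subject, each predicted by a model not trained on it."""
    errors = []
    for test in k_fold_indices(len(xs), k):
        train = [i for i in range(len(xs)) if i not in test]
        model = fit_group_means([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(ys[i] - predict(model, xs[i]) for i in test)
    return errors

feature = [1, 1, 1, 0, 0, 0]  # e.g. presence of one imaging feature in six tumors
expression = [2.0, 2.2, 1.8, -1.0, -0.8, -1.2]
loo = cross_validation_errors(feature, expression, k=len(feature))  # leave-one-out
print(max(abs(e) for e in loo))
```

Because each prediction comes from a model that never saw the held-out subject, small held-out errors are evidence that the map generalizes rather than memorizes.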

Finally, the most robust method for confirming statistical significance and predictive value is to test the association map against a completely independent set of subjects. Because the association map has not been trained on the new set of patients, the ability of the map to predict the outcomes in the test set provides strong evidence that the association map is generalizable—meaning that the map can be used to give diagnostic and prognostic information on most, if not all, future subjects.

An approach to constructing an association map is illustrated in Example 1 using imaging features on three-phase contrast-enhanced CT and gene expression patterns of 28 human hepatocellular carcinomas (HCC). As will become apparent, global gene expression patterns of human cancers are encoded in their dynamic imaging features. To relate gene expression to imaging, distinctive features from qualitative imaging were identified, and coherent patterns of variation in gene expression profiles were defined.

In another aspect, methods for using an association map constructed as described above, and as exemplified in Example 1, are provided. In one embodiment, the association map is used to guide treatment or provide a diagnosis of a subject. For example, an image of a tumor in a subject, such as a brain, breast, lung, or prostate tumor, can be viewed in light of the association map to inform the clinician of the gene or protein expression of the patient. Knowledge of the gene or protein expression profile, i.e., molecular based information, about the patient informs the clinician about a patient's likely response to a drug, probability of relapse, survival rate, disease free survival, and the like. Such information will guide the treatment regimen, including the drug selection, dose, dosing regimen, and whether additional treatments should be considered, such as radiotherapy or tumor resection. Thus, a noninvasive image of a patient informs the clinician of molecular information useful in guiding treatment.

While the methods have been exemplified mainly using disease conditions, the methods can also be used for preventative medicine, in which case the biological data, with indeterminate image data, may suggest further imaging to be performed on a subject, e.g., to watch for likely diseases or conditions. This situation would arise, for example, when a subject was at risk for a disease, based on genetic data, lifestyle data, and laboratory tests but the presence of the disease could not be definitively shown by imaging or other methods.

Association maps are also suited for use in predicting subject outcome. Gene expression data or sequence variation patterns that predict treatment response to particular therapies are reported in the medical literature. For example, subjects with breast cancer that express particular cell surface receptors, such as HER2, are more responsive to certain chemotherapeutic agents than subjects that do not express these receptors. Thus, an image of a tumor or other diseased tissue in a subject, viewed in light of an association map, can be used to predict response to a selected treatment.

It will also be appreciated that association maps can be constructed from images and biological data generated or gathered solely for this purpose, or for another particular purpose. For example, images of patients who were not responsive to a particular drug, together with biological data from those subjects, can be used to build an association map.

An association map between imaging and biological data can also be used to design a targeted therapeutic treatment regimen for a patient, providing a personalized care program. Based on an image of a tumor viewed in light of an association map for that tumor type, information about the gene and/or protein expression of the tumor can be determined. Understanding the tumor cell surface receptors permits selection of targeting agents, such as antibody fragments or other agents that have binding specificity for particular cell surface receptors, that can guide or direct a drug to the tumor cell. The targeting agent can be attached directly to the drug, or attached to a carrier for the drug, such as a liposome.

It will be appreciated that the method described herein can be accompanied, if desired, by additional clinical information for a patient, such as a

III. Examples

The following examples are illustrative in nature and are in no way intended to be limiting.

Materials and Methods

Imaging features/traits. One hundred thirty-eight (138) distinct imaging features that were present in at least one tumor sample were defined and scored across all tumor samples. Features were selected a priori based on intrinsic radiological interest (e.g., internal arteries and hypodense halos).

Features were also filtered based on their frequency and prominence in the data, inter-observer agreement, and independence from other features based on Pearson correlation (cutoff value of 0.9). Thirty-two (32) imaging features were used as input in the Bayesian model, and 28 of 32 were found to be informative of gene expression (FIG. 5).
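The frequency and independence filters can be sketched in Python as below. The minimum-frequency parameter, the rule of keeping the more frequent feature of a highly correlated pair, and the use of a strict |r| > 0.9 cutoff are assumptions; inter-observer agreement, which was also used in this example, is not modeled. Feature names are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def filter_features(features, min_present=2, r_cutoff=0.9):
    """Return names of features that pass a frequency filter and a redundancy filter.

    features: {name: [score per tumor]}, where a score of 0 means the feature is absent.
    """
    def frequency(name):
        return sum(1 for s in features[name] if s != 0)

    # 1) Frequency: drop features seen in fewer than min_present tumors.
    frequent = [n for n in features if frequency(n) >= min_present]
    # 2) Independence: visit features from most to least frequent (ties keep input order)
    #    and skip any feature whose |r| with an already kept feature exceeds r_cutoff.
    kept = []
    for name in sorted(frequent, key=frequency, reverse=True):
        if all(abs(pearson(features[name], features[k])) <= r_cutoff for k in kept):
            kept.append(name)
    return kept

scores = {
    "internal_arteries": [1, 1, 0, 1, 0, 0],
    "arterial_channels": [1, 1, 0, 1, 0, 0],  # hypothetical duplicate of the feature above
    "hypodense_halo":    [0, 1, 1, 0, 1, 0],
    "rare_feature":      [0, 0, 0, 0, 0, 1],
}
print(filter_features(scores))  # ['internal_arteries', 'hypodense_halo']
```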

Microarray data. Gene expression profiles of the imaged HCCs were downloaded from the Stanford Microarray Database, which is available via the Stanford website. Data from array elements with a hybridization signal 1.5-fold over background in both the Cy5 and Cy3 channels, and present in 70% of samples, were mean-centered across samples. Data from replicate probes representing the same gene (as determined by LocusLink ID) were averaged. A total of 6732 genes met these criteria for data quality and were used for subsequent analysis.
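The array filtering, mean-centering, and replicate-averaging steps might look roughly like the following Python sketch. The input layout (per-sample log ratios and per-channel signal-to-background ratios for each array element), the interpretation of "present" as passing the 1.5-fold test in a given sample, the choice to treat failing measurements as missing, and the strict greater-than comparison are all assumptions, not details taken from the Stanford Microarray Database.

```python
from statistics import mean

def preprocess(elements, fold=1.5, min_fraction=0.7):
    """Filter array elements, mean-center them, and average replicate probes per gene.

    elements: list of dicts (hypothetical layout) with keys
      "locus"                - gene identifier used to group replicate probes
      "ratio"                - per-sample log expression ratios
      "cy5_fold", "cy3_fold" - per-sample signal-to-background ratios for each channel
    Returns {locus: [value per sample, or None if no probe passed in that sample]}.
    """
    by_locus = {}
    for el in elements:
        n = len(el["ratio"])
        # A measurement counts as present only if both channels exceed the fold cutoff.
        present = [el["cy5_fold"][i] > fold and el["cy3_fold"][i] > fold for i in range(n)]
        if sum(present) < min_fraction * n:
            continue  # element not well measured in enough samples
        values = [r if ok else None for r, ok in zip(el["ratio"], present)]
        center = mean(v for v in values if v is not None)
        by_locus.setdefault(el["locus"], []).append(
            [None if v is None else v - center for v in values])
    genes = {}
    for locus, rows in by_locus.items():
        averaged = []
        for col in zip(*rows):  # average replicate probes sample by sample
            observed = [c for c in col if c is not None]
            averaged.append(mean(observed) if observed else None)
        genes[locus] = averaged
    return genes

elements = [
    {"locus": "A", "ratio": [1.0, 2.0, 3.0, 2.0], "cy5_fold": [2.0] * 4, "cy3_fold": [2.0] * 4},
    {"locus": "A", "ratio": [0.0, 1.0, 2.0, 1.0], "cy5_fold": [2.0, 2.0, 2.0, 1.0], "cy3_fold": [2.0] * 4},
    {"locus": "B", "ratio": [0.5] * 4, "cy5_fold": [2.0] * 4, "cy3_fold": [1.2] * 4},
]
print(preprocess(elements))  # {'A': [-1.0, 0.0, 1.0, 0.0]}
```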

Module network. A previously developed module network procedure (Segal, E. et al., Nat. Genet., 34:166-176 (2003)) was applied to construct an association map between imaging features and gene expression profiles. The module network procedure takes as input gene expression data and a set of potential regulatory inputs, and attempts to partition the expression data into distinct and mutually exclusive modules, such that the expression of the genes assigned to each module can be well predicted by a small decision tree of regulatory inputs. The regulatory inputs were set to be the real-valued imaging features and were applied to the expression data described above. The 116 imaging networks can be interactively searched (Segal et al. (2007) Nat. Biotechnol. 25:675-80).

Module enrichment in Gene Ontology annotations. The significance of overlap between genes in modules and Gene Ontology annotations was calculated by comparison to the degree of overlap expected by chance alone using the hypergeometric distribution. Multiple hypothesis testing was accounted for by calculating a false discovery rate (FDR), and results with FDR<0.05 are presented.
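The enrichment test can be written with the hypergeometric upper tail: for N genes in total, K carrying a given annotation, and a module of n genes of which k are annotated, the p-value is P(X ≥ k). This document does not name its false discovery rate procedure; the Benjamini-Hochberg step-up adjustment used in the Python sketch below is an assumption, and the counts in the usage lines are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes, K annotated, n genes in the module."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical counts: 6732 genes, 150 with an annotation, a 40-gene module containing 9.
p = hypergeom_sf(9, 6732, 150, 40)
q = benjamini_hochberg([p, 0.2, 0.03, 0.6])
print(p, [qv < 0.05 for qv in q])
```

Python's exact integer `comb` avoids overflow for genome-scale counts, at the cost of speed; a statistics library would normally be used for large numbers of tests.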

Mapping venous invasion genes to imaging features. To find imaging features that correspond to the set of 91 genes associated with venous invasion, seven (7) modules that were significantly enriched for these genes were identified using the hypergeometric distribution as described above. The associated imaging feature trees of the 7 modules were analyzed (Table, below), and two features, internal arteries and halos, were found to be overrepresented among the top splits. To identify a consensus threshold for each of these features, the p-value weighted average of the split thresholds from the 7 image feature trees was calculated. The consensus thresholds were used for the imaging feature decision tree of FIG. 4A.

TABLE. Venous Invasion Module Analysis

Imaging Trait                       | Node Level | Frequency | Module
------------------------------------|------------|-----------|--------------------
Internal Arteries, Density          | 1          | 4         | 595, 720, 651, 773
Hypodense Halo                      | 1          | 2         | 479, 556
Tumor - Liver Difference, Minimum   | 1          | 1         | 697
Tumor Margin Score, Maximum         | 2          | 2         | 720, 773
Attenuation Heterogeneity, Maximum  | 2          | 2         | 595, 697
Internal Arteries, Rank             | 2          | 1         | 556
Internal Septa                      | 2          | 1         | 651
Tumor Margin Score, Minimum         | 2          | 1         | 479
Tumor Margin Score, Minimum         | 3          | 3         | 773, 556, 720
Wash-out, Maximum                   | 3          | 1         | 651
Necrosis, Density                   | 3          | 1         | 595
Tumor Margin Score, Maximum         | 3          | 1         | 697
Attenuation Heterogeneity, Maximum  | 4          | 1         | 651

The table lists the position (node level) of each imaging feature/trait in the decision trees used to predict the 7 venous invasion modules, together with its frequency of occurrence at that node level. Internal Arteries, followed by Hypodense Halo, are over-represented in the imaging networks, occupying the top node level with the highest frequency, and were therefore used to construct the venous invasion predictor.
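The text above does not specify how p-values were turned into weights for the consensus threshold. Because weighting directly by raw p-values would give the least significant splits the most influence, the Python sketch below assumes weights of -log10(p), so that more significant splits pull the consensus threshold harder; the weight function is a parameter so it can be replaced by whatever scheme was actually used. The thresholds and p-values in the usage line are hypothetical.

```python
import math

def consensus_threshold(splits, weight=lambda p: -math.log10(p)):
    """Weighted average of one feature's split thresholds across several module trees.

    splits: list of (threshold, p_value) pairs. The default weight, -log10(p), is an
    assumption: it gives more significant splits more influence.
    """
    weights = [weight(p) for _, p in splits]
    return sum(w * t for w, (t, _) in zip(weights, splits)) / sum(weights)

# Hypothetical thresholds and p-values for one imaging feature in three module trees.
print(consensus_threshold([(1.0, 1e-4), (2.0, 1e-2), (1.5, 1e-3)]))
```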

Clinical data analysis. Microscopic venous invasion status on histologic analysis was available for 30 patients in the training set and 32 patients in the test set. Within each data set, patients were partitioned into two groups based on the two-feature decision tree (“internal arteries” and “hypodense halos” on CT scan, FIG. 4A). Significance of association between the two imaging-defined groups and histologic venous invasion was calculated using two-by-two contingency tables and the chi-square test. Overall survival data were available for 23 patients in the training set and 32 patients in the test set; only patients with clear surgical margins after HCC resection were included in this analysis. Within each data set, patients were partitioned based on the presence or absence of the “internal arteries” feature on CT scan, and Kaplan-Meier survival analysis of the two groups was performed in Winstat (R. Fitch Software, Bad Krozingen, DE).
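The two clinical analyses above can be sketched in Python with only the standard library. The chi-square statistic below is Pearson's uncorrected statistic for a 2x2 table, with its 1-degree-of-freedom p-value computed as erfc(sqrt(chi2/2)); whether a continuity correction was used in the original analysis is not stated. The Kaplan-Meier estimator multiplies, at each distinct death time, the fraction of at-risk patients who survive; censored patients leave the risk set without contributing a death. This is an illustration, not a reimplementation of Winstat, and the input numbers are made up.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Rows: imaging-defined groups; columns: venous invasion present / absent.
    Returns (statistic, p_value); every row and column total must be nonzero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if death was observed, 0 if censored.
    Returns [(time, survival probability)] at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve

# Illustrative numbers only, not the patients analyzed in this example.
print(chi_square_2x2(8, 2, 3, 12))
print(kaplan_meier([5, 8, 8, 12, 20, 24], [1, 1, 0, 1, 0, 1]))
```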

Construction of Association Map. In this example, a three-step strategy was used to create an “association map” between imaging features and gene expression patterns. More particularly, an association map between imaging features on three-phase contrast-enhanced CT and gene expression patterns of 28 human hepatocellular carcinomas (HCC; Chen, X. et al., Mol. Biol. Cell, 13:1929-1939 (2002)) was constructed, as shown in FIG. 1D. In the analysis, 138 distinctive imaging features present in one or more HCCs were defined and quantified. To identify informative features, features were filtered based on their frequency and prominence in the data, inter-observer agreement between two radiologists, and independence from other features as determined by Pearson correlation among the features (cutoff r=0.9). Thirty-two imaging features were judged most promising by these criteria and used for subsequent analysis (FIG. 5). For instance, and with reference to FIGS. 1A-1C, channels of radio-dense signal within certain tumors on the arterial phase of the CT scan were noted, and this feature was termed “internal arteries”.

Next, a module networks algorithm (Segal, E. et al., Nat. Genet., 34:166-176 (2003)) was adopted to systematically search for associations between expression levels of 6732 well-measured genes determined by microarray analysis (Chen, X. et al., Mol Biol Cell, 13:1929-1939 (2002)) and combinations of imaging features. The algorithm identifies groups of genes, termed modules, which demonstrate coherent variation in expression across multiple samples. The algorithm further applies an iterative Bayesian probabilistic procedure to identify combinations of imaging features that can predict the expression levels of gene modules. An end result is identification of specific networks of imaging features that predict the expression level of gene modules. Each network of imaging features predicts the expression level of one gene module.

Next, statistical significance of the association map was validated by comparison with permuted data sets, and also by testing the prediction of the association map in an independent set of tumors.

The association map of imaging features and gene expression revealed that a surprisingly large fraction of the gene expression program can be reconstructed from a small number of imaging features, as seen in FIGS. 2A-2B. The expression variation in 6732 genes was captured by 116 gene modules, each of which was associated with specific combinations of imaging features. For each module, the presence or absence of combinations of imaging features predicted the aggregate expression level of genes within the module (FIG. 2A). The combinations of relevant imaging features are depicted in decision trees: each split in the tree is specified by variation of an imaging feature; each terminal leaf in the tree is a cluster of samples that share a similar expression pattern of module genes. Thus, the association map allowed one to predict the relative expression level of a gene (by mapping to a module) in a given HCC sample (by mapping to a cluster).
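Reading a prediction off the association map amounts to two lookups and a tree traversal: gene to module, then down the module's imaging-feature tree to a leaf (a cluster of samples), whose stored value is the predicted relative expression. The map in this Python sketch is a hand-written toy with hypothetical gene, module, and feature names and made-up leaf values; it does not reproduce any module from FIG. 2A.

```python
# Toy association map with hypothetical names and made-up values (not from FIG. 2A).
TOY_MAP = {
    "gene_to_module": {"GENE_A": "module_X", "GENE_B": "module_X", "GENE_C": "module_Y"},
    "modules": {
        "module_X": {"feature": "internal_arteries", "threshold": 0.5,
                     "left": {"leaf": -0.8},
                     "right": {"feature": "hypodense_halo", "threshold": 0.5,
                               "left": {"leaf": 0.6}, "right": {"leaf": 1.4}}},
        "module_Y": {"leaf": 0.0},
    },
}

def predict_relative_expression(assoc_map, gene, imaging_scores):
    """Return (module, predicted relative expression) for one gene in one imaged tumor."""
    module = assoc_map["gene_to_module"][gene]
    node = assoc_map["modules"][module]
    while "leaf" not in node:
        # Scores above the threshold go right, all others go left (a convention assumed here).
        side = "right" if imaging_scores[node["feature"]] > node["threshold"] else "left"
        node = node[side]
    return module, node["leaf"]

tumor = {"internal_arteries": 1, "hypodense_halo": 0}
print(predict_relative_expression(TOY_MAP, "GENE_B", tumor))  # ('module_X', 0.6)
```

Every gene in the same module receives the same prediction for a given tumor, which is why the map predicts relative rather than gene-specific absolute expression.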

The hierarchical combination of only 28 imaging features was sufficient to predict the variation of all 116 gene modules. As shown in FIG. 2B, only nine features were sufficient to predict the expression patterns of 50% of the full complement of gene activities, and the prediction plateaus above 80% of the full complement of gene activities with more than 23 features. For each gene, the number of features needed to predict its variation was three on average and no more than four in any instance. The association of imaging features and gene expression was highly significant by several independent statistical criteria. Specification of the entire module network involved 355 splits based on imaging features. The average gene expression level differed significantly between the two sides of the split in 299 of the 355 splits (p<0.05 after applying the conservative Bonferroni correction), accounting for 5282 of 6732 input genes (78.5%). Comparison of the observed association map of imaging features and gene expression with maps derived from data sets with permuted sample labels confirmed that the predictive power of imaging features for expression patterns was highly unlikely to be due to chance alone. The log-likelihood was −18 per microarray, compared with only −23±0.1 expected by chance (10 permutations; p<10⁻⁵⁰). Thus, the variation in gene expression is densely encoded by a small number of imaging features. Once discovered, such “coding” image features can be quickly used to translate visual images into the underlying gene expression.
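As a worked check of the split statistics above, a Bonferroni correction across the 355 split tests corresponds to a per-split significance threshold of 0.05/355, and the reported proportions follow directly from the counts:

```latex
\alpha_{\text{per split}} = \frac{0.05}{355} \approx 1.41 \times 10^{-4},
\qquad
\frac{299}{355} \approx 84.2\%,
\qquad
\frac{5282}{6732} \approx 78.5\%
```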

Using the association map, the imaging features that predict the expression level of specific genes are directly revealed, and the potential physiologic significance of many imaging features can be inferred from their associated genes. The distribution of genes into modules defined by imaging features was not random; the modules were highly enriched for specific and diverse biological functions and processes. Comparison of gene membership in modules with the published Gene Ontology annotation (Ashburner, M. et al., Nat. Genet., 25:25-29 (2000)) revealed significant overlaps, as shown in FIG. 2C, allowing many key physiologic properties of tumors to be gleaned from CT images. For example, three imaging features predicted the expression level of module 697, which is highly enriched in genes involved in cell proliferation, including PCNA, cyclin A, MCM5, MCM6, and geminin, as shown in FIG. 3A. In addition, the expression level of VEGF, an important driver of tumor angiogenesis and the target of the approved anti-angiogenic drug bevacizumab (Kerr, D. J., Nat. Clin. Pract. Oncol., 1:39-43 (2004)), co-varies with these cell cycle genes and is predicted by the same imaging features, as seen in FIG. 3A.
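
The text does not name the test used to score the overlap between modules and Gene Ontology categories. A common choice, assumed here, is a one-sided hypergeometric test: given N annotated genes, K of which carry a GO term, and a module of n genes, it asks how likely an overlap of at least k genes would be by chance. The function name and arguments are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(n_total: int, n_category: int, n_module: int, n_overlap: int) -> float:
    """One-sided p-value P(X >= n_overlap), X ~ Hypergeometric(N, K, n).

    n_total:    genes in the analysis (N)
    n_category: genes annotated with the GO term (K)
    n_module:   genes in the module (n)
    n_overlap:  module genes carrying the GO term (k)
    """
    upper = min(n_category, n_module)
    # Sum the exact integer counts of draws with k or more annotated genes.
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    # Divide once at the end so rounding error enters only here.
    return tail / comb(n_total, n_module)
```

Since each module is compared against many GO categories, such p-values would normally also be corrected for multiple testing before an overlap is called significant.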

Thus, in one embodiment, the association map provides a method for non-invasively delineating a molecularly distinct subset of tumors for a targeted therapeutic strategy. For example, the liver synthetic function of HCC patients is an important guide to disease severity (Thomas, M. B. et al., J. Clin. Oncol., 23:8093-8108 (2005)), and this information is evident in module 595, which captures the expression levels of albumin, pyruvate kinase, and transferrin receptor 2, and also reveals clotting function (thrombin, factor V, factor X) and detoxification activity (GSTO1, CYP27A1, epoxide hydrolase), as seen in FIG. 3B.

It will also be appreciated that the identity of the genes in a module can reveal the physiologic basis of an imaging feature. The imaging feature “Tumor Margin Score, Minimum” denotes tumors that show an ill-defined transition zone between the tumor and the surrounding liver tissue. The presence of this feature was found to be associated with elevated expression of a group of genes involved in extracellular matrix remodeling, such as MMP2, MMP7, COL3A1, COL6A2, thrombospondin 1, and thrombospondin 2, as seen in FIG. 3C. Several of these genes, notably MMP2 (Giannelli, G. et al., Int. J. Cancer, 97:425-431 (2002); Qin, L. X. et al., World J. Gastroenterol., 8:385-392 (2002)) and thrombospondin (Qin, L. X. et al., World J. Gastroenterol., 8:385-392 (2002); Poon, R. T. et al., Clin. Cancer Res., 10:4150-4157 (2004)), are known to increase tumor invasiveness into the surrounding stroma, which may lead to the poor demarcation of tumor margins on CT imaging.

The association map also enables systematic mapping of a predetermined group of genes to their corresponding imaging features. A group of 91 genes whose expression variation is associated with microscopic venous invasion has been identified (Chen, X. et al., Mol Biol Cell, 13:1929-1939 (2002)). Microscopic venous invasion is a well-established sign of poor prognosis (Thomas, M. B. et al., J. Clin. Oncol., 23:8093-8108 (2005)) that is extremely difficult to predict using conventional imaging methods in the absence of gross venous invasion. Here, the 91 genes in the “venous invasion signature” were enriched in 7 modules and associated with two predominant imaging features: the presence of “Internal Arteries” and the absence of “Hypodense Halos”, as seen in FIG. 4A and FIG. 5. It was therefore evaluated whether this pair of imaging features, as observed on the pre-operative CT scan, predicted the occurrence of microscopic venous invasion on histologic analysis. In 30 patients with HCC, tumors with this combination of imaging features had a twelve-fold increased risk of microscopic venous invasion (p=0.004).
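
A two-feature predictor of this kind reduces to a 2×2 table: tumors with or without the feature combination versus tumors with or without microscopic venous invasion on histology. The text reports a twelve-fold increased risk and p=0.004 but does not say which statistics were used, so the sketch below assumes a two-sided Fisher exact test, a common choice for a cohort of about 30 patients, and reports both an odds ratio and a relative risk. Function names are hypothetical, and no patient counts from the study are reproduced.

```python
import math
from math import comb

# Table layout (an assumption of this sketch):
#                      invasion   no invasion
#   features present      a            b
#   features absent       c            d


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of top-left count a with all margins fixed."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum the probabilities of all same-margin tables no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    total = 0.0
    for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = _table_prob(k, row1, col1, n)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """(a*d)/(b*c), adding 0.5 to every cell if any cell is zero (Haldane-Anscombe)."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of invasion with the features divided by risk without (rows must be non-empty)."""
    risk_present = a / (a + b)
    risk_absent = c / (c + d)
    return math.inf if risk_absent == 0 else risk_present / risk_absent
```

Odds ratios and relative risks diverge when the outcome is common, so which one a "twelve-fold increased risk" refers to matters when comparing cohorts.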

The predictive value of the two-feature predictor of venous invasion was validated in an independent set of 32 patients who were not used to train the association map (FIG. 4A, p=0.03). The presence of the feature “Internal Arteries” on the pre-operative CT scan of HCCs was a significant univariate predictor of overall survival in both groups of patients, as seen in FIG. 4B. Thus, the association map can identify novel imaging features corresponding to gene expression signatures and provide useful information to guide clinical decision making.
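
The text calls “Internal Arteries” a significant univariate predictor of overall survival without naming the test. A log-rank test comparing patients with and without the feature is one standard univariate choice and is assumed in the sketch below; times are follow-up durations, and events are 1 for death and 0 for censoring. The p-value uses the chi-square distribution with one degree of freedom, for which P(X > x) = erfc(√(x/2)).

```python
import math
from typing import List, Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic, p-value), 1 df."""
    # (time, event, group) with group 0 = A (e.g. feature present), 1 = B
    data: List[Tuple[float, int, int]] = [
        (t, e, 0) for t, e in zip(times_a, events_a)
    ] + [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects censored at t are still counted as at risk at t.
        n = sum(1 for ti, _, _ in data if ti >= t)
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        d_a = sum(1 for ti, ei, g in data if ti == t and ei and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value
```

A log-rank test weights all event times equally; if early deaths were of particular interest, a weighted variant or a Cox model with the feature as a single covariate would be alternatives.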

In summary, the global gene expression profiles of liver cancer are embodied in their imaging features. The systematic association between imaging features and gene expression allowed useful inference in both directions. On one hand, the association map identified biological processes, based on specific gene expression programs, that underlie specific imaging features. On the other hand, the association map enabled the use of imaging features to reconstruct the global gene expression programs of cancer, thereby creating a noninvasive “molecular portrait” of the tumor (FIGS. 3A-3C). The utility of this approach was shown by identifying and validating a two-feature predictor of venous invasion in HCC (FIG. 4). Moreover, the “Internal Arteries” feature that emerged from this analysis was a significant predictor of survival in two independent groups of patients. These results demonstrate that existing imaging technology may be used to reconstruct the molecular anatomy of disease, such as cancer, in a noninvasive fashion. The examples and data set forth herein, using liver cancer as an exemplary disease, illustrate the robustness of the method. Canonical association maps constructed from large representative series of tumors will enable routine noninvasive diagnosis of genetically heterogeneous tumors, reveal their prognosis, and allow serial profiling of tumors during therapy. This type of imaging-based molecular profiling permits personalized medicine.

While a number of exemplary aspects and embodiments have been discussed above, those of skill in the art will recognize certain modifications, permutations, additions and sub-combinations thereof. It is therefore intended that the following appended claims and claims hereafter introduced are interpreted to include all such modifications, permutations, additions and sub-combinations as are within their true spirit and scope. 

1. A method of constructing an association map between imaging features and biological data, comprising: identifying one or more imaging features from a plurality of images of a subject; applying an algorithm to identify relationships between the one or more imaging features and biological data relating to the subject, wherein the identified relationships are used to construct an association map between the one or more imaging features and the biological data; evaluating the statistical significance of the association map to test its predictive value.
 2. The method of claim 1 wherein the features from a plurality of images of a subject are associated with a disease.
 3. The method of claim 1, wherein the identifying comprises identifying one or more imaging features based on frequency of the one or more features in the plurality of images.
 4. The method of claim 1, wherein the identifying comprises identifying one or more imaging features based on their independence from other features.
 5. The method of claim 1, wherein said identifying comprises identifying one or more imaging features from images obtained using an imaging technique selected from the group consisting of computerized tomography imaging, magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasonography (US), optical imaging, infrared imaging, and x-ray radiography.
 6. The method of claim 5, wherein the imaging technique comprises the use of an imaging agent or image-enhancing agent.
 7. The method of claim 1, wherein said applying comprises applying a module networks algorithm.
 8. The method of claim 1, wherein said applying comprises applying an algorithm that applies an iterative Bayesian probabilistic procedure that identifies combinations of imaging features that relate to the biological data.
 9. The method of claim 1, wherein said applying comprises applying an algorithm to gene expression data.
 10. The method of claim 9, wherein the gene expression data is from a DNA microarray assay.
 11. The method of claim 9, wherein the gene expression data is from a cDNA microarray assay.
 12. The method of claim 9, wherein the gene expression data is from an RNA microarray assay.
 13. The method of claim 1, wherein the applying comprises applying an algorithm to protein expression data.
 14. The method of claim 1, wherein the evaluating the statistical significance of the association map comprises evaluating by comparison of the map with permuted data sets.
 15. The method of claim 1, wherein the evaluating the statistical significance of the association map comprises evaluating by testing the prediction using an independent biological data set, independent images, or both.
 16. A method for predicting a gene or protein expression level in a biological sample, comprising: providing an image of the biological sample, comparing the image to an association map constructed in accord with claim 1 to predict a gene or protein expression of the biological sample.
 17. The method of claim 16, further comprising, based on the predicting, providing a treatment prognosis of said patient based on the presence and/or absence of certain imaging features.
 18. The method of claim 17, wherein the providing comprises providing a prediction of a patient's response to a drug.
 19. The method of claim 17, wherein the providing comprises providing a prediction of a patient's probable survival.
 20. The method of claim 19, wherein the probable survival is disease free survival.
 21. The method of claim 17, wherein the providing comprises providing a likelihood of disease recurrence.
 22. The method of claim 17, wherein the providing comprises providing a likelihood of metastasis.
 23. An association map constructed using the method of claim 1.
 24. An association map constructed using the method of claim 16.